[0001] This patent application claims priority from a provisional patent application entitled "System and Method for Analyzing and Describing Electronic Data, and generating Major and Minor Variant Samples of Electronic Data," Serial No. 60/314,715, having a filing date of 8-24-01. This patent application is also a continuation in part of another utility patent application entitled "System and Method for Conducting Electronic Commerce," Serial No. 09/767,442, having a filing date of 1-19-01.

FIELD OF THE INVENTION

[0002] The present invention relates generally to a system and method of analyzing electronic data and, more particularly, to a system and method of determining the inherent structure of one or more incoming data files and generating output data for use in retrieving or testing electronic data.

BACKGROUND OF THE INVENTION

[0003] The need for the efficient analysis of electronic data has become increasingly important as reliance upon computer systems has increased. Electronic data, regardless of its type, benefits from descriptive information capable of identifying and characterizing the individual data elements of incoming data files. To illustrate, information describing the starting position, length, delimiting character, etc., of individual data elements allows a database management system or other system to more efficiently read and utilize an incoming data file.

[0004] The analysis of electronic data files requires descriptive information, whether found within the data file or sourced externally, to identify and describe each data element. For example, descriptive information allows database management systems to more efficiently extract data, extract specific subsets of data, convert identified data into other formats, import data from other systems and/or prepare external systems to utilize the incoming data file. If descriptive information is not available, efficient use of the incoming data file is extremely difficult.

[0005] Typically, known systems utilize defined data formats to provide descriptive information for electronic data files. Defined data formats, such as xBase, Excel, EDI or XML, contain descriptive information which may be used to identify individual data elements of an incoming data file. Data files having a defined format are typically referred to as structured files.

[0006] Although equipped with a predefined format, structured files such as EDI and XML are often accompanied by one or more implementation guides. These implementation guides provide additional descriptive information for each element of the structured data file.

[0007] Some electronic data is produced without the benefit of a predefined format. This type of electronic data is referred to as semi-structured data and is typically organized in a manner such that individual data elements may be identified through data analysis. Specifically, the position of individual data elements or the presence of delimiting characters within the semi-structured file may be used to identify and describe the structural characteristics of each individual data element. The organization of the data in this manner typically requires a painstaking process by the owner of the data during extraction. Unfortunately, the process of organizing the data held within each data file is time consuming and expensive.

SUMMARY OF THE INVENTION

[0008] Accordingly, the present invention provides a system and method of analyzing electronic data that eliminates the need for externally sourced descriptions, thus reducing the time and expense associated with manual creation of data file descriptive information. The present invention is capable of automatically analyzing one or more incoming data files, generating information descriptive of the structure of each data file and producing output data similar or identical in structure to the incoming data file(s) for use in subsequent applications.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] Figure 1 is a component diagram of one embodiment of the present invention.

[0010] Figure 2 is a flowchart of the data analysis process for structured data files of one embodiment of the present invention.

[0011] Figure 2A is a flowchart illustrating a portion of the record break analysis process of one embodiment of the present invention.

[0012] Figure 2B is a flowchart illustrating a portion of the field break analysis process of one embodiment of the present invention.

[0013] Figure 3 is a flowchart of the data analysis process for semi-structured data files of one embodiment of the present invention.

[0014] Figure 4 is an illustration of structured data file hierarchical representations of one embodiment of the present invention.

[0015] Figure 5 is an illustration of semi-structured data file hierarchical representations of one embodiment of the present invention.

[0016] Figure 6 is a flowchart illustrating a portion of the output generation process of one embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

[0017] The present invention is herein described as a computer implemented method of analyzing electronic data, as a computer readable medium comprising a plurality of instructions for analyzing computer intelligible electronic data and as a computer system for analyzing electronic data. Referring to the Figures, the present invention is capable of analyzing electronic data to determine the structural characteristics of the data. The structural characteristics may then be used to generate output data comprising a structural map of the incoming data for use in a variety of applications.

[0018] Referring to Fig. 1, the present invention is equipped with a processing unit (12) capable of reading and analyzing computer intelligible electronic data, as illustrated by Box (13). In one embodiment, the present invention provides a storage device (14) electrically coupled to the processing unit (12). In another embodiment, the present invention provides a user interface (15) through which the user may view and/or modify output data (16). In one embodiment of the present invention, only references to the source of output data (16) are stored within the storage device (14). Specifically, the analyzed data files (20) themselves need not be stored, as illustrated by Box (11) of Figure 1.

[0019] The present invention is highly versatile and may be used with a variety of hardware platforms. For example, the present invention may be used with a host of personal computers and mid-range computer platforms (not shown). Platform specific code may be generated for Windows, Solaris, Linux, and Hewlett Packard HP-UX operating systems, if desired.

[0020] Any media type or environment supported by the operating system and hardware platform, whether local to the system or over a network, may be used by the present invention. For example, direct access storage devices (DASD), write-once read-many devices (WORM), directly accessible tape and solid state devices (SSD), single or multiple read/write head, redundant array (RAID of any level), or jukebox subsystems may be utilized by the present invention. The present invention is capable of efficient operation without the use of proprietary media formats, hidden partitions, or any other storage media preparation in addition to that required and/or supported by the operating system and hardware platform on which the present invention is installed.

[0021] The present invention is capable of efficiently analyzing incoming data (20) regardless of its type or structure. In one embodiment of the present invention, the term "data" is used to describe actual characters or values such as a name (e.g. John Smith) or date (e.g. 5/7/01) stored in a computer intelligible format. In another embodiment, data is accumulated into files (20). Files (20) may take the form of a computer data file, a computer application, or any data input stream or data collection introduced from an outside application or system. In one embodiment, these files (20) may be divided into records (22) comprising a physical or logical division of the file (20) into one or more sets of characters. In another embodiment, individual data elements retaining some characteristic or value in addition to their simple character contents are referred to as fields (24). For example, "2001" could be classified by value, year, and/or street number fields, depending upon its intended use.

[0022] In one embodiment, the structure of the incoming data (20) is determined by analyzing the syntactic and semantic characteristics of the incoming data. In one embodiment, syntax refers to the physical characteristics of the fields (24) and/or records (22) present within the incoming data files. For example, if a given field (24) contains the data "2001", syntax would include a length of four characters of the numeric type. Syntax may also include the field's position within the record as compared to other fields, as well as the number of fields (24) in a record (22), the accumulated character lengths of each field, and the record's position within an electronic data file (20) as compared to other records. Additionally, syntax may include the overall file size, the creation date, the last modified date and the number of records (22) in the file (20). Syntax may also be used as a validation test for data, as discussed below.
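
For purposes of illustration only, the syntactic characteristics described above may be sketched in Python as follows. The names FieldSyntax, classify and record_syntax are hypothetical and do not appear in the present application. The sketch assumes a record has already been divided into field values, and it captures only the length, character type and relative position of each field.

```python
from dataclasses import dataclass


@dataclass
class FieldSyntax:
    length: int      # number of characters in the field
    char_type: str   # "numeric", "alpha", "alphanumeric" or "other"
    position: int    # zero-based position of the field within its record


def classify(value):
    """Return a coarse character type for a field value (assumed categories)."""
    if value.isdigit():
        return "numeric"
    if value.isalpha():
        return "alpha"
    if value.isalnum():
        return "alphanumeric"
    return "other"


def record_syntax(fields):
    """Describe the syntax of every field in one record."""
    return [FieldSyntax(len(v), classify(v), i) for i, v in enumerate(fields)]


profile = record_syntax(["2001", "Smith", "5/7/01"])
```

Under these assumptions, the field "2001" is described as four characters of the numeric type, matching the example given in the text.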

[0023] In one embodiment, semantics refers to the attribute characteristic values of a field (24), record (22) or file (20). For example, if a given field (24) contains the data "2001", semantics may take the form of a "year" definition to describe the data and may also include definitions such as "Ordered Items" or "Shipping Information", depending on the type of data at issue. For any given data file (20), semantics may include broad definitions such as "Company X Purchase Order" or "XML Transaction Database". Semantic information is typically provided by the user upon creation of the field (24), record (22) or file (20) at issue.

Data Analysis

[0024] The present invention is capable of receiving and analyzing one or more incoming data files (20) to produce output data (16) capable of providing a concise description of the structural characteristics of the incoming data (20). Data files to be analyzed are collected through a reading process, as illustrated by Box (13) of Figure 1. In one embodiment, incoming data files (20) are not imported or otherwise modified from their original state but are simply read by the processing unit (12) of the present invention for analysis.

[0025] Although the present invention is capable of reading and analyzing individual electronic data files (20), it may be advantageous to combine substantially similar data files (18) for simultaneous analysis. Data files may be substantially similar in content and/or structure in that the files have at least one common characteristic. For example, electronic purchase orders and electronic invoices used by Company X, although distinct types of data files, may contain commonality. By analyzing similar files (18) simultaneously or as a continuous stream of data, the processing unit (12) of the present invention is capable of determining the structural characteristics of the electronic data files with greater accuracy. In short, a large number of files (18 and 20) having some degree of commonality will provide the system with additional examples of the possible structural configurations for the files, thus refining the analysis process.

[0026] Once similar data files (18 and 20), if any, have been grouped for reading, the processing unit (12) of the present invention is designed to automatically identify each electronic data field and its associated structural data.

[0027] The system, upon reading electronic data, identifies the file type associated with the incoming electronic data file(s) (20) as illustrated by Box (26) of Figures 2 and 3. In one embodiment, the present invention identifies the incoming data file (20) as having a structured, semi-structured or unstructured file type. In one embodiment, structured data refers to XML, EDI or other "tagged" data formats. In another embodiment, semi-structured data refers to ASCII, flat, positional or delimited file formats.
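
For purposes of illustration only, the file type identification step may be sketched in Python as follows. The heuristics shown are assumptions made for this sketch and are not taken from the present application: a leading "<" is treated as XML-like tagged data, the leading segment names "ISA", "UNA" and "UNB" are treated as EDI, a constant delimiter count per line is treated as a delimited file, and a constant line length is treated as a positional file. The function name identify_file_type and the DELIMITERS list are hypothetical.

```python
DELIMITERS = [",", "\t", "|", ";"]  # assumed candidate delimiting characters


def identify_file_type(text):
    """Return "structured", "semi-structured" or "unstructured" (heuristic)."""
    stripped = text.lstrip()
    # Tagged formats: XML-like markup or common EDI interchange headers.
    if stripped.startswith("<") or stripped.startswith(("ISA", "UNA", "UNB")):
        return "structured"
    lines = [ln for ln in text.splitlines() if ln]
    if len(lines) >= 2:
        # Delimited: every line carries the same non-zero delimiter count.
        for d in DELIMITERS:
            counts = {ln.count(d) for ln in lines}
            if len(counts) == 1 and counts != {0}:
                return "semi-structured"
        # Positional: every line has the same length.
        if len({len(ln) for ln in lines}) == 1:
            return "semi-structured"
    return "unstructured"
```

A production system would more likely consult a maintained collection of format signatures than a fixed list of rules such as this one.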


[0028] Referring to Figure 2, if the incoming data file has an explicitly named structure, the named structure is used in conjunction with the incoming file (20) to break the file into records (22) and fields (24). Specifically, the processing unit (12) of the present invention determines which structure type is associated with the incoming data, as illustrated by Box (28) of Figure 2. In one embodiment, the present invention maintains a library (not shown) to assist in identifying both the file type (26) and the structure type (28) of the incoming data file (20). To illustrate, an incoming XML file would first be identified by the present invention as having a structured file type (26). The present invention may then access the library to determine which structure type (28) is exhibited by the incoming file (20). In the present example, the processing unit (12) would determine that an XML structure type is utilized using information describing the attributes of XML files stored within the library.

[0029] Structured data sources provide syntactical and semantic information regarding the incoming file (20) through the inherent structure representation of the file. Accordingly, a high percentage of the syntactical and semantic information for structured incoming data files is automatically captured and utilized by the present invention upon being read into the system (10).

[0030] Referring to Figs. 2 and 2A, once the file type (26) and structural type (28) have been determined, the processing unit (12) of the present invention analyzes the electronic data file (20) to identify the file's record break information, as illustrated by Box (30) of Figure 2. In one embodiment, the present invention analyzes the structured file to identify record break characters (32) typically used with the pre-determined file type. In one embodiment, record break information (30) comprises demarcated record break characters (32) and/or character counts.

[0031] Utilizing the record break information (30) inherent within a structured file (20), the processing unit (12) parses the electronic data file (20) into one or more electronic data records (22). Data records (22) having substantially similar attributes are identified and matched by the processing unit (12), as illustrated by Box (34) of Figure 2. In one embodiment, data records (22) are matched by comparing syntactic information residing within the data file (20). For example, character counts and common headings found within the incoming data file (20) may be used to denote substantially similar records (22).
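
For purposes of illustration only, the parsing and matching of records may be sketched in Python as follows. The sketch assumes a single known record break character and treats the first three characters of each record as its heading; both assumptions, the sample data and the names split_records and match_records are hypothetical.

```python
from collections import defaultdict


def split_records(data, record_break="\n"):
    """Parse a file's contents into records at each record break character."""
    return [rec for rec in data.split(record_break) if rec]


def match_records(records, heading_len=3):
    """Group records sharing the same heading and the same character count."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec[:heading_len], len(rec))].append(rec)
    return dict(groups)


data = "HDR0001ACME\nITM0001BOLT\nITM0002NUTS\nTRL0000002\n"
groups = match_records(split_records(data))
```

Here the two "ITM" records fall into one group because they share both a heading and a character count, while the header and trailer records each form their own group.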

[0032] Referring to Figs. 2 and 2B, once individual records (22) have been identified, the processing unit (12) of the present invention analyzes each individual record (22) in order to identify field break information, as illustrated by Box (36) of Figure 2. In one embodiment, field break information comprises demarcated field break characters (38) and/or transition points. The field break information (38) is utilized to parse data records (22) into individual data fields (24). The processing unit (12) of the present invention compares each individual data field (24) contained within previously matched records (22) to establish syntactic values for the entire data file (20), as illustrated by Box (62) of Figure 2. The data analysis process may be repeated as many times as is necessary to determine each individual data record (22), field (24) and element within the incoming data file (20), as illustrated by Box (60) of Figure 2.

[0033] Referring to Figs. 2A and 3, the present invention is capable of determining the structural characteristics of an incoming file (20) that has no explicitly named structure. To accomplish this, the present invention first determines the file type (26) at issue. Data files (20) having no explicitly named structure are read into the system for data analysis. The processing unit (12) of the present invention analyzes the incoming data file(s) (20) to identify record break information, as illustrated by Box (40) of Figure 3. In one embodiment, record break information comprises one or more line termination characters (42) and/or character counts found within the data file (20). The record break information (40) is utilized to parse the electronic data file (20) into one or more electronic data records (22), as illustrated by Box (44). Data records (22) having substantially similar attributes are identified and matched by the processing unit (12) of the present invention. In one embodiment, data records (22) are matched by comparing syntactic information within the data file (20). For example, character counts and common headings found within the incoming data file may be used to denote substantially similar records, as illustrated by Box (44).

[0034] Referring to Figs. 2B and 3, once individual records (22) have been identified, the processing unit (12) of the present invention analyzes each individual record (22) in order to identify field break information (46). In one embodiment, field break information (46) comprises character type transitions and/or character counts, as illustrated by Box (48) of Figure 3. The field break information is utilized to parse data records into individual data fields. The processing unit (12) of the present invention compares each individual data field contained within previously matched records to ensure commonality between data fields, as illustrated by Box (50) of Figure 3. As with data analysis of structured data files, the data analysis process for semi-structured or unstructured files may be repeated as many times as is necessary to determine each individual data record (22), field (24) and element (23) within the incoming data file (20), as illustrated by Box (70).
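
For purposes of illustration only, the use of character type transitions as field break information may be sketched in Python as follows. The four character classes and the names char_class, transition_points, split_on_transitions and common_transitions are assumptions made for this sketch. The last function keeps only those break positions shared by every record in a matched group, which is one possible way of checking commonality between fields.

```python
def char_class(ch):
    """Map a character to an assumed class: Numeric, Alpha, Space or Punctuation."""
    if ch.isdigit():
        return "N"
    if ch.isalpha():
        return "A"
    if ch.isspace():
        return "S"
    return "P"


def transition_points(record):
    """Return each index at which the character class changes."""
    return [i for i in range(1, len(record))
            if char_class(record[i]) != char_class(record[i - 1])]


def split_on_transitions(record):
    """Parse a record into fields at every character type transition."""
    points = [0] + transition_points(record) + [len(record)]
    return [record[a:b] for a, b in zip(points, points[1:])]


def common_transitions(records):
    """Return the break positions shared by all records in a matched group."""
    sets = [set(transition_points(r)) for r in records]
    return sorted(set.intersection(*sets)) if sets else []
```

For example, the record "ITM0001BOLT" changes class at positions 3 and 7 and is therefore parsed into the fields "ITM", "0001" and "BOLT".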


[0035] Referring to Figs. 2 and 3, once data analysis is completed, records and fields are identified across all incoming data, as illustrated by Box (54). The structural patterns found during data analysis may then be used to create the output data (16) of the present invention. In one embodiment, the output data (16) created by the present invention contains a structural description of at least a portion of the analyzed data file(s) (20). In one embodiment, the present invention creates output data (16) describing the structural characteristics of the analyzed data file (20) or files as a whole.

[0036] Each output (16), as created by the present invention, provides a concise description of the analyzed data file(s) (20), providing the user with information about the analysis. This information includes the identification of record types present within the incoming file(s), the structure of each record, the sequence of record types, the cardinality of each record type, how records are grouped together and whether the records are optional, required, vary in count, or repeat according to a discernible pattern. In one embodiment, output data (16) is converted into a pre-selected computer intelligible language (e.g., XML) that may then be stored within the storage device (14) for later use.
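
For purposes of illustration only, the conversion of a structural description into XML may be sketched in Python as follows, using the standard xml.etree.ElementTree module. The element and attribute names (fileStructure, recordType, field, minOccurs, maxOccurs) and the input dictionary layout are hypothetical; the present application does not specify an XML vocabulary.

```python
import xml.etree.ElementTree as ET


def describe_structure(record_types):
    """Serialize record types, their cardinality and their fields as XML text."""
    root = ET.Element("fileStructure")
    for rt in record_types:
        rec = ET.SubElement(root, "recordType", {
            "name": rt["name"],
            "minOccurs": str(rt["min"]),   # 0 marks an optional record
            "maxOccurs": str(rt["max"]),   # "unbounded" marks a repeating record
        })
        for f in rt["fields"]:
            ET.SubElement(rec, "field", {
                "name": f["name"],
                "length": str(f["length"]),
                "type": f["type"],
            })
    return ET.tostring(root, encoding="unicode")


xml_text = describe_structure([
    {"name": "ITEM", "min": 1, "max": "unbounded",
     "fields": [{"name": "YEAR", "length": 4, "type": "Numeric"}]},
])
```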

[0037] In one embodiment, the present invention uses tokenized symbology to denote the structure of the analyzed data (20) as represented by the output data (16) of the present invention. The present invention analyzes the inherent structure of the incoming data (20). In one embodiment, once the structure of an incoming data file (20) has been determined, each analyzed data file is tokenized such that each unique record (22) and field (24) is defined. Once tokenized, the structural characteristics of the analyzed data are used to assign a symbolic identifier to each structural component of the data file (20). Thus, the analyzed data file (20) represented by the output data (16) of the present invention is assigned one or more tokenized symbols capable of symbolically representing the structural characteristics (16) of the analyzed data.
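
For purposes of illustration only, the assignment of symbolic identifiers may be sketched in Python as follows. The sketch assumes each record has already been reduced to a signature (here simply its heading) and assigns tokens of the form "R1", "R2" and so on in order of first appearance. The token format and the name tokenize are hypothetical.

```python
from collections import Counter


def tokenize(signatures):
    """Assign one symbol per unique record signature; return symbols and sequence."""
    symbols = {}
    sequence = []
    for sig in signatures:
        if sig not in symbols:
            symbols[sig] = "R%d" % (len(symbols) + 1)
        sequence.append(symbols[sig])
    return symbols, sequence


symbols, sequence = tokenize(["HDR", "ITM", "ITM", "TRL"])
cardinality = Counter(sequence)  # how many times each record type occurs
```

The resulting token sequence gives a compact view of the order and count of record types within the file.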

Data Organization

[0038] Referring to Figures 4 and 5, the present invention is capable of describing the relationship amongst and between each data element (23) within incoming data files (20). Specifically, the present invention is capable of generating a hierarchical representation (52) of each record (22) and field (24) within an analyzed data file (20). In one embodiment, each data element (23) is referred to as a node and each node is a direct descendant of a parent node. By identifying the parentage of each node, the present invention provides the user with a reference with which to determine and/or locate the source file to which each data element belongs, as illustrated by Box (11) of Figure 1. In this manner, each data element (23) has a defined position within the hierarchy as well as a defined parentage, a specific organizational scheme and information from which the user may determine the data element's siblings, collections and/or children. For the purpose of illustration only, an example of one naming convention that may be used by the present invention is provided below.

[0039] In one embodiment, a multi-level decimal point notation system may be used to identify each file (20), record (22) and field (24). The decimal point notation system may also be valuable in providing unique identification values for each data element (23). For example, if the value "1" is used to describe a file (20) and all of its content, the first record of this file may be labeled "1.1" while the second record may be labeled "1.2". Accordingly, the first field contained in the first record "1.1" would be labeled as "1.1.1" while the second field in the first record would be labeled as "1.1.2".
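
For purposes of illustration only, the multi-level decimal point notation may be sketched in Python as follows. The sketch assumes labels are ordered file, record, field and that numbering starts at one; the names label_nodes and parent are hypothetical.

```python
def label_nodes(file_id, records):
    """Label a file, its records and their fields as file.record.field."""
    labels = {str(file_id): "file"}
    for r, fields in enumerate(records, start=1):
        record_label = "%s.%d" % (file_id, r)
        labels[record_label] = "record"
        for f, value in enumerate(fields, start=1):
            labels["%s.%d" % (record_label, f)] = value
    return labels


def parent(label):
    """Return the label of a node's parent, or None for the file itself."""
    return label.rsplit(".", 1)[0] if "." in label else None


labels = label_nodes(1, [["a", "b"], ["c"]])
```

Because each label contains the labels of its ancestors, the parentage of any node can be recovered by dropping the final level.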


[0040] In some cases, the data analysis process may indicate that the presence of a particular field (24) or record (22) is optional and/or repeating. For example, an optional field or record may be designed to carry an additional alpha character notation that is needed only in special cases. To illustrate, given two fields with node values of "1.2.3A" and "1.2.3B", the user and/or the processing unit (12) of the present invention may quickly determine that field "3" in record "2" of file "1" may have two distinct, yet valid, entries.

[0041] In one embodiment, the present invention utilizes two methodologies to describe the hierarchy of nodes. The first methodology employed by one embodiment of the present invention utilizes a structural description. To illustrate, an incoming file (20) may have several repeating records (22) or fields (24) that are not useful in describing the structural characteristics of the file. In this example, the structural node description is designed to limit the scope of the data analysis process to the basic structure of the file (20), thus excluding the replication patterns of the above mentioned records (22) and/or fields (24). In this example, the practical result is that the structural description expresses only a single record instead of a plurality of repeating records or fields that do not provide the system with structural information.

[0042] A second methodology employed by one embodiment of the present invention utilizes a data sample node description. In one embodiment, the data sample node description displays each and every data element (23) within the selected file (20), without regard to repetition or redundancy. Specifically, this second methodology uses additional values in the node description to indicate the iteration number of any repeating records (22) or fields (24). For example, node values of "(1-1)1.2.1" and "(1-2)1.2.1" would represent first and second iterations of a repeating field with the node value of "1.2.1". In this example, "1.2.1" serves as a link between the simple structural description and the data sample description.
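
For purposes of illustration only, the link between a data sample node value and its structural node value may be sketched in Python as follows. The sketch assumes that the second number inside the parentheses is the iteration number; the text does not state the meaning of the first number, so it is parsed and ignored. The names SAMPLE_LABEL and parse_sample_label are hypothetical.

```python
import re

# "(a-b)x.y.z": a is not interpreted, b is taken as the iteration number,
# and x.y.z is the structural node value.
SAMPLE_LABEL = re.compile(r"^\((\d+)-(\d+)\)(\d+(?:\.\d+)*)$")


def parse_sample_label(label):
    """Return (iteration number, structural node value) for a sample label."""
    match = SAMPLE_LABEL.match(label)
    if match is None:
        raise ValueError("not a data sample node value: %r" % label)
    return int(match.group(2)), match.group(3)
```

Both "(1-1)1.2.1" and "(1-2)1.2.1" resolve to the structural node value "1.2.1", which is what allows the two descriptions to be cross-referenced.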

Modification

[0043] Referring to Figs. 1, 4 and 5, a user interface (15) is provided by the present invention to allow the display of both the hierarchical representation (52) of the structural characteristics of the analyzed data file (20) and the specific semantic and syntactical information for each file, record (22) and field (24). In one embodiment, the hierarchical representation (52) is expressed as an expansion "tree" wherein the "root" denotes the file, the "branches" denote the records and the "leaves" denote the fields. The expansion tree is suited to express the nodes and may take advantage of the node methodologies described above. In one embodiment, node values describe the root/branch/leaf construct through the multi-level decimal point methodology.

[0044] In another embodiment, a detailed collection of tables is used to list the structural characteristics that may be reviewed and/or edited by the user. These values include, but are not limited to:

- Minimum / Maximum Length
- Minimum / Maximum Value
- Justification
- Format
- Modified (True/False)
- Mandatory (True/False)
- Type (Alpha, Numeric, etc.)
- User-Defined Name
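
For purposes of illustration only, one row of such a table may be sketched in Python as follows. The attribute names mirror the list above, while the defaults, the summarize function and its inference rules (a field is "Numeric" when every observed value consists of digits, and mandatory when no observed value is blank) are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldCharacteristics:
    min_length: int
    max_length: int
    min_value: Optional[str] = None
    max_value: Optional[str] = None
    justification: str = "left"   # not inferred in this sketch
    format_spec: str = ""         # not inferred in this sketch
    modified: bool = False        # set to True once the user edits the row
    mandatory: bool = True
    type: str = "Alpha"
    user_defined_name: str = ""


def summarize(values, name=""):
    """Derive editable characteristics from the observed values of one field."""
    present = [v.strip() for v in values if v.strip()]
    lengths = [len(v) for v in present] or [0]
    numeric = bool(present) and all(v.isdigit() for v in present)
    ordered = sorted(present, key=int) if numeric else sorted(present)
    return FieldCharacteristics(
        min_length=min(lengths),
        max_length=max(lengths),
        min_value=ordered[0] if ordered else None,
        max_value=ordered[-1] if ordered else None,
        mandatory=len(present) == len(values),
        type="Numeric" if numeric else "Alpha",
        user_defined_name=name,
    )


year = summarize(["2001", "1999", ""], name="YEAR")
```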


[0045] The present invention allows the user to view the structural characteristics gleaned from the data analysis process and then modify them to achieve the proper results. Changes can be made to individual records (22) or fields (24) by the user, as desired.

[0046] In one embodiment, the user interface (15) of the present invention is designed to limit the displayed fields (24) to only those of the same type. For example, in a semi-structured file of addresses, one field typically denotes "CITY". The display of this semi-structured file may be filtered to only the "CITY" data field, thus allowing the user to review the structural characteristics for that field (24). This feature of the present invention helps the user ensure that the automated data analysis process accurately identifies the "CITY" field elements across all records. In another embodiment, the list of "CITY" field elements is sorted alphabetically to allow for faster searching and review by the user.

Output

[0047] Referring to Figs. 1, 2, 3 and 6, the present invention is capable of providing three types of output data. The first type of output data (16TD) employed in one embodiment of the present invention is designed for use by other applications intending to use data files (20) analyzed by the present invention as their own input. The second type of output data (16T) employed in one embodiment of the present invention is designed for use in generating different versions of one or more incoming data files (20). The third type of output data (16) employed in one embodiment of the present invention takes the form of generated data that describes the entire file, its data elements and its structure.


[0048] In one embodiment, the first type of output data (16TD) of the present invention may be converted to an XML document as described above. The XML document is then used to parse and translate documents being fed into another system, such as an electronic commerce system. Structural characteristics are expressed within the XML document using normalized XML values and expressions to enable them to be read and utilized by any system capable of reading and processing an XML document. In short, the data analysis process of the present invention is used as a parse command generator, thus enabling a subsequent user to describe an incoming data file (20) to an external system.

[0049] Referring to Figure 6, in one embodiment, the second type of output data (16T) is a user managed collection of output data (16) capable of matching the original incoming data file (20) with the exception of one or more specific data fields (24A) containing user defined modifications (80). This collection of output data may then be used to evaluate how the system reacts to these minor changes. The present invention is data-centric in that it introduces no outside influences or presumptions to the generation of the collection of output data (16T). Specifically, all output data (16T) produced by the processing unit (12) of the present invention is sourced from the analyzed data files (20) and varies only according to specific modification information (80) as supplied by the user.

[0050] The present invention is capable of using the structural characteristics of analyzed files (20) to create a plurality of output data (16T) identical to the analyzed data file (20) with the exception of modified fields. Through the user interface (15), the user is given the opportunity to select specific values (54) that will be used within each individual field (24) within the output data. In one embodiment, the resulting output data (16T) utilize the original values of the incoming data file(s) for all records (22) and fields (24) with the exception of a predetermined data field (24A). The value (54) of this predetermined field (24A) is entered by the user as a modification instruction (80). In one embodiment, if the modified field (24A) has more than one value (54) entered by the user, the next output, or set of output, will use the second value (54B) from the user's entry, then the third (54C), etc., until all of the user's values (54) have been used to produce output data (16).
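
By way of non-limiting illustration, the substitution of user values into a single predetermined field may be sketched as follows. The sketch assumes that the records (22) of an analyzed data file (20) are held as a list of dictionaries keyed by field name; that layout, the function name generate_variants and the sample field names are hypothetical and do not appear in the specification.

```python
from copy import deepcopy


def generate_variants(records, record_index, field_name, user_values):
    """Yield one copy of `records` per user value, changing a single field.

    `records` stands in for the records (22) and fields (24) of an analyzed
    data file (20). The list-of-dicts layout is an assumption of this sketch.
    """
    for value in user_values:
        variant = deepcopy(records)  # every other field keeps its original value
        variant[record_index][field_name] = value  # the modified field (24A)
        yield variant


# Hypothetical sample data: two records with two fields each.
original = [
    {"order_id": "A100", "quantity": "5"},
    {"order_id": "A101", "quantity": "7"},
]

# Three user values (54) produce three sets of output data, in entry order.
variants = list(generate_variants(original, 0, "quantity", ["0", "9999", "abc"]))
```

The third value deliberately places characters in a field that held a number; the sketch makes no attempt to judge whether that output will succeed downstream.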

[0051] Modification information (80) may take virtually any type (e.g., alpha, numeric, time, date, etc.) or format as desired by the user. The present invention allows for the generation of output data (16) that can purposefully fail, and through this failure, invoke additional error handling programming. Accordingly, the present invention does not attempt to evaluate the impact (likely success or failure of subsequent processing) of the user's modification. For example, it is possible for the user to place characters into a numeric field for the purpose of causing an expected error when the output data is parsed during subsequent processing. In one embodiment, the modification of one or more fields (24) by the user, accumulated across substantially all fields in an analyzed data file (20), may be used to generate output data (16) differentiated according to the modified field (24A) only. To provide the user with convenient entry of modification information (80), one embodiment of the present invention provides Field Increment Value Setting (FIVS).

[0052] The FIVS process of the present invention allows the user to manage numerical range values which describe some collection of values for user modifications. For example, the user value "1-5" is equal to entering user values of "1, 2, 3, 4, 5", and will generate five output data files (16T). To further illustrate, "1-1000" will generate one thousand sets of output data, differing according to the user's modification instructions one thousand times. In addition to simple ranging, the present invention allows the user to provide step increments as well. For example, the user value "0-4; 2" would be interpreted to be equivalent to entering "0, 2, 4" since the number following the semi-colon describes the size of the step increment. Step increments may also be sub-integer in value, where "0-4;.5" would be interpreted as entering "0, .5, 1, 1.5, 2, 2.5, 3, 3.5, 4".
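
By way of non-limiting illustration, the range and step notation described above may be expanded as in the following sketch. The function name expand_fivs_range is hypothetical, and the sketch assumes non-negative bounds, an inclusive upper bound and a default step of one; none of these details is fixed by the specification beyond the examples given.

```python
from decimal import Decimal


def expand_fivs_range(spec):
    """Expand a range such as "1-5" or "0-4;.5" into a list of value strings.

    Only non-negative bounds are handled, because a leading minus sign would
    be ambiguous with the range separator in this simplified notation.
    """
    range_part, _, step_part = spec.partition(";")
    start_text, _, end_text = range_part.partition("-")
    start = Decimal(start_text.strip())
    end = Decimal(end_text.strip())
    # A missing step (no semi-colon) is assumed to mean a step of one.
    step = Decimal(step_part.strip()) if step_part.strip() else Decimal(1)
    if step <= 0:
        raise ValueError("step increment must be positive")

    values = []
    current = start
    while current <= end:
        # Decimal arithmetic avoids binary floating point drift on steps like .5;
        # normalize() drops trailing zeros so 1.0 is emitted as "1".
        values.append(format(current.normalize(), "f"))
        current += step
    return values


simple = expand_fivs_range("1-5")
stepped = expand_fivs_range("0-4; 2")
fractional = expand_fivs_range("0-4;.5")
```

The sketch writes fractional values with a leading zero ("0.5" rather than ".5"); this is a presentation choice of the sketch only.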

[0053] In another embodiment, data formatting is also provided within the FIVS process. For example, the user value "5-7: 000" is interpreted to be equivalent to entering user values of "005, 006, 007" where the characters following the colon describe the character filled format of the output field value. Additionally, other data may be prepended or appended to the ranged values in an FIVS command. For example, the user value "FIRST_"1-3"_LAST" will output FIRST_1_LAST, FIRST_2_LAST and FIRST_3_LAST. In one embodiment of the present invention, a special type of FIVS value is provided for use in management of mandatory unique identifiers for each set of output data (16), hereinafter referred to as the Trans-Session Unique Naming System (T-SUNS).
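
By way of non-limiting illustration, the character filled format and the prepended and appended data described above may be handled as in the following sketch. The function names are hypothetical. The sketch assumes that the template after the colon gives both the field width (its length) and the fill character (its first character), and that a ranged value is delimited by a pair of quotation marks inside the surrounding literal text; both are interpretations of the examples rather than stated rules.

```python
import re


def expand_formatted_range(spec):
    """Expand "5-7: 000" into ["005", "006", "007"]."""
    range_part, _, template = spec.partition(":")
    start, end = (int(part) for part in range_part.split("-"))
    template = template.strip()
    values = range(start, end + 1)
    if not template:
        return [str(v) for v in values]
    # Assumed rule: pad on the left to the template's length with its first character.
    return [str(v).rjust(len(template), template[0]) for v in values]


def expand_affixed_range(spec):
    """Expand FIRST_"1-3"_LAST into one string per integer in the range."""
    match = re.fullmatch(r'([^"]*)"(\d+)-(\d+)"([^"]*)', spec)
    if match is None:
        raise ValueError(f"unrecognized affixed range: {spec!r}")
    prefix, start, end, suffix = match.groups()
    return [f"{prefix}{n}{suffix}" for n in range(int(start), int(end) + 1)]


padded = expand_formatted_range("5-7: 000")
affixed = expand_affixed_range('FIRST_"1-3"_LAST')
```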

[0054] T-SUNS allows the user to designate which field(s) (24) will be modified for each set of output data (16). T-SUNS is an improvement upon FIVS. Specifically, FIVS is capable of understanding single fields (24) and the set or range of values (54) that the user intends for FIVS to place in modified fields (24A) during output data (16) generation. However, output data (16) is typically used for processing within an external system. This means that the external system may require each document to be equipped with a unique identification number.

[0055] T-SUNS accommodates this requirement through a triggering command (not shown) entered by the user during the FIVS process for a predetermined data field or fields. In one embodiment, the triggering command instructs the system to insert a new value for the field (24) or fields having an attached triggering command. This new value may then be used as a unique identifier not only for the modified data fields but also for the analyzed data from which the output data (16) is created. The present invention uses structural information to accomplish this. By maintaining information regarding the structural characteristics of the field(s) (24) to which a unique identification has been assigned, the present invention is capable of generating the unique number when required for any and all output data (16) containing fields (24) with an attached triggering command.

[0056] In one embodiment, the unique identification number consists of a preamble, amble and postamble. The present invention allows the user to edit the identification number as desired. In one embodiment, the user may edit the preamble and postamble as desired as well as set the starting value and format for the amble portion. For example, a typical use for unique identification numbers is for purchase orders. This type of document typically uses numeric counters surrounded by alphabetic characters and formatted with leading 0's. To illustrate, a purchase order number of XYZ0001PO may be managed within the T-SUNS process as XYZ for the preamble, 1:0000 as the amble and PO as the postamble. Starting with these values, T-SUNS would return XYZ0001PO the first time, XYZ0002PO the second time, and so on. Since the structural characteristics are used for the preamble, amble and postamble, subsequent output data (16), even if produced in a later session, are capable of using the next available increment of the identification number as long as the original output data's (16) structural characteristics still apply.
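
By way of non-limiting illustration, a preamble, amble and postamble identifier of the kind described above may be generated as in the following sketch. The class name UniqueIdState, its field names and the use of a plain dictionary as the saved state are hypothetical. How the state is actually carried from one session to the next is not described in the specification, so the round trip through a dictionary is an assumption standing in for whatever storage is used.

```python
from dataclasses import dataclass, asdict


@dataclass
class UniqueIdState:
    preamble: str
    next_value: int     # the counter held in the amble
    amble_format: str   # e.g. "0000": width is its length, fill is its first character
    postamble: str

    def next_id(self):
        """Return the next identifier and advance the counter."""
        fill = self.amble_format[:1] or "0"
        amble = str(self.next_value).rjust(len(self.amble_format), fill)
        self.next_value += 1
        return f"{self.preamble}{amble}{self.postamble}"


def parse_amble(amble):
    """Split an amble such as "1:0000" into its starting value and format."""
    start_text, _, fmt = amble.partition(":")
    return int(start_text), fmt


start, fmt = parse_amble("1:0000")
state = UniqueIdState("XYZ", start, fmt, "PO")
first = state.next_id()
second = state.next_id()

# Assumed persistence step: the saved state could be written out and read
# back in a later session so numbering resumes where it stopped.
saved = asdict(state)
resumed = UniqueIdState(**saved)
third = resumed.next_id()
```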

[0057] In addition to the above, the present invention is capable of automatically modifying output data (16) through the use of the Field Instance Bounds Generation (FIBG) process. Unlike FIVS where the user enters all data modification instructions (80), FIBG uses the structural characteristics of the output data (16) to determine the boundaries of valid data for selected fields (24A). Once this is accomplished, the FIBG process of the present invention "pushes" the boundaries. For example, FIBG is capable of producing output data (16) based on the following boundaries.

[0058] A first boundary employed by one embodiment of the present invention is referred to as minimum minus. Minimum minus tests use the structural characteristics of the output data (16) to determine the minimum value for a given field (24). In one embodiment, output data (16) includes a first value that meets the predetermined minimum value as well as a second value that decrements the first value by one. Thus, the present invention may presume that output data (16) having the first value which meets the predetermined minimum value will pass subsequent processing (e.g., a positive test) while the output data (16) having the second value that does not meet the predetermined minimum value will not pass subsequent processing (e.g., a negative test). In one embodiment, the presumption of success or failure is indicated in the naming convention for each set of output data (16) to provide the user with simplified review and identification of each set of output data.

[0059] A second boundary employed by one embodiment of the present invention is referred to as maximum plus. Maximum plus uses the structural characteristics to determine the maximum value in order to output one data set for each field (24) that meets the maximum value, as well as one data set for each field that exceeds the maximum value. In one embodiment, the presumption of success or failure is indicated in the naming convention for each set of output data (16) to provide the user with simplified review and identification of each set of output data.
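
By way of non-limiting illustration, the minimum minus and maximum plus boundaries may be sketched as follows. The function names and the PASS and FAIL name suffixes are hypothetical. The sketch assumes that the bounds of a numeric field are taken from the smallest and largest values observed in the analyzed data, and that the maximum is exceeded by exactly one, mirroring the decrement described for minimum minus; the specification does not state either detail.

```python
def infer_bounds(observed_values):
    """Assumed rule: the valid range is the span of values seen in the analyzed data."""
    return min(observed_values), max(observed_values)


def boundary_cases(field_name, observed_values):
    """Return (output_name, value, presumed_to_pass) for one numeric field.

    The presumption of success or failure is carried in the output name so
    each set of output data can be identified at a glance.
    """
    minimum, maximum = infer_bounds(observed_values)
    return [
        (f"{field_name}_min_PASS", minimum, True),
        (f"{field_name}_min_minus_FAIL", minimum - 1, False),
        (f"{field_name}_max_PASS", maximum, True),
        (f"{field_name}_max_plus_FAIL", maximum + 1, False),
    ]


# Hypothetical field with observed values between 1 and 999.
cases = boundary_cases("quantity", [5, 1, 999, 42])
```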

[0060] A third boundary employed by one embodiment of the present invention is referred to as blank field. Using this boundary, blank spaces, instead of alphanumeric characters, are used for the predetermined field. A fourth boundary employed by one embodiment of the present invention is referred to as field type. Field type uses the structural characteristics of the field type at issue so that each field may be varied. For example, typical field types include alpha, numeric and alphanumeric. Fields marked for this type of output data generation will produce three data sets, one with all alpha characters, one with only numeric characters, and one with mixed alphanumeric characters. In one embodiment, the presumption of success or failure is based on the field type at issue.
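
By way of non-limiting illustration, the blank field and field type boundaries may be sketched as follows. The function names and the particular filler characters are hypothetical. The rule used for the presumption of success (a variant passes when it matches the declared type, and an alphanumeric field accepts all three variants) is an assumption of the sketch, since the specification says only that the presumption is based on the field type at issue.

```python
import string


def blank_field_value(length):
    """Blank spaces in place of alphanumeric characters, at the field's length."""
    return " " * length


def field_type_values(length):
    """Return one all-alpha, one all-numeric and one mixed value of `length`.

    The mixed value alternates letters and digits, so it is only truly
    mixed for fields of two or more characters.
    """
    alpha = (string.ascii_uppercase * length)[:length]
    numeric = (string.digits * length)[:length]
    mixed = "".join(
        string.ascii_uppercase[i % 26] if i % 2 == 0 else string.digits[i % 10]
        for i in range(length)
    )
    return {"alpha": alpha, "numeric": numeric, "alphanumeric": mixed}


def presumed_to_pass(declared_type, variant_type):
    """Assumed rule for the presumption of success or failure."""
    return declared_type == "alphanumeric" or declared_type == variant_type


blank = blank_field_value(4)
variants = field_type_values(4)
# Presumptions for a field whose analyzed type is numeric.
presumptions = {kind: presumed_to_pass("numeric", kind) for kind in variants}
```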

[0061] A fifth boundary employed by one embodiment of the present invention is referred to as decimal count. Decimal count is utilized primarily in conjunction with fields (24) that indicate a numeric type or have a predefined decimal format. FIBG output data (16) having these fields will increment and decrement the decimal position for the field's data. The presumption of success or failure is based on the existing decimal format for the field, with any matching format assuming success, and any deviation from the standard format assuming failure.
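
By way of non-limiting illustration, the decimal count boundary may be sketched as follows. The function name is hypothetical. The sketch reads "increment and decrement the decimal position" as producing values with one more and one fewer digit after the decimal point than the original format; that reading is an assumption, and shifting the decimal point itself would be a different but equally plausible interpretation.

```python
from decimal import Decimal


def decimal_count_variants(value_text):
    """Return (text, presumed_to_pass) for decimal counts of n - 1, n and n + 1.

    `n` is the number of digits after the decimal point in `value_text`.
    Only the variant that keeps the existing format is presumed to pass.
    """
    value = Decimal(value_text)
    places = max(-value.as_tuple().exponent, 0)
    variants = []
    for count in (places - 1, places, places + 1):
        if count < 0:
            continue  # a whole number has no decimal place to remove
        variants.append((f"{value:.{count}f}", count == places))
    return variants


two_places = decimal_count_variants("12.50")
whole_number = decimal_count_variants("7")
```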

[0062] Although the invention has been described with reference to specific embodiments, this description is not meant to be construed in a limited sense. Various modifications of the disclosed embodiments, as well as alternative embodiments of the invention, will become apparent to persons skilled in the art upon reference to the description of the invention. It is, therefore, contemplated that the appended claims will cover such modifications that fall within the scope of the invention.






